
US reopens airspace over El Paso after claim of cartel drone infiltration

Al Jazeera

United States aviation authorities have announced that the airspace over El Paso, Texas, has been reopened after it was initially closed over an alleged drone incursion by a Mexican cartel. Wednesday's announcement walked back an earlier statement from the Federal Aviation Administration (FAA) that had abruptly paused air traffic over the southern border city for 10 days. By late morning, though, the FAA announced that flights would resume in and out of the area as normal, prompting questions about the legitimacy of the drone claims. "The temporary closure of airspace over El Paso has been lifted."


RoadSens-4M: A Multimodal Smartphone & Camera Dataset for Holistic Road-way Analysis

Khandakar, Amith, Michelson, David, Rabbani, Shaikh Golam, Shafi, Fariya Bintay, Ahamed, Md. Faysal, Rahman, Khondokar Radwanur, Rahman, Md Abidur, Nabi, Md. Fahmidun, Ayari, Mohamed Arselene, Khan, Khaled, Suganthan, Ponnuthurai Nagaratnam

arXiv.org Artificial Intelligence

It's important to monitor road issues such as bumps and potholes to enhance safety and improve road conditions. Smartphones are equipped with various built-in sensors that offer a cost-effective and straightforward way to assess road quality. However, progress in this area has been slow due to the lack of high-quality, standardized datasets. This paper discusses a new dataset created by a mobile app that collects sensor data from devices like GPS, accelerometers, gyroscopes, magnetometers, gravity sensors, and orientation sensors. This dataset is one of the few that integrates Geographic Information System (GIS) data with weather information and video footage of road conditions, providing a comprehensive understanding of road issues with geographic context. The dataset allows for a clearer analysis of road conditions by compiling essential data, including vehicle speed, acceleration, rotation rates, and magnetic field intensity, along with the visual and spatial context provided by GIS, weather, and video data. Its goal is to support initiatives that enhance traffic management, infrastructure development, road safety, and urban planning. Additionally, the dataset will be publicly accessible to promote further research and innovation in smart transportation systems.


VocalBench-DF: A Benchmark for Evaluating Speech LLM Robustness to Disfluency

Liu, Hongcheng, Hou, Yixuan, Liu, Heyang, Wang, Yuhao, Wang, Yanfeng, Wang, Yu

arXiv.org Artificial Intelligence

While Speech Large Language Models (Speech-LLMs) show strong performance in many applications, their robustness is critically under-tested, especially to speech disfluency. Existing evaluations often rely on idealized inputs, overlooking common disfluencies, particularly those associated with conditions like Parkinson's disease. This work investigates whether current Speech-LLMs can maintain performance when interacting with users who have speech impairments. To facilitate this inquiry, we introduce VocalBench-DF, a framework for the systematic evaluation of disfluency across a multi-dimensional taxonomy. Our evaluation of 22 mainstream Speech-LLMs reveals substantial performance degradation, indicating that their real-world readiness is limited. Further analysis identifies phoneme-level processing and long-context modeling as the primary bottlenecks responsible for these failures. Strengthening the recognition and reasoning capabilities of individual components and pipelines can substantially improve robustness. These findings highlight the urgent need for new methods to improve disfluency handling and build truly inclusive Speech-LLMs.
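The kind of disfluency perturbation the benchmark evaluates can be illustrated with a small sketch. The categories and insertion logic below are illustrative assumptions, not VocalBench-DF's actual taxonomy or pipeline:

```python
import random

# Illustrative disfluency types (assumed, not the benchmark's taxonomy)
FILLERS = ["uh", "um"]

def add_repetition(words, index):
    """Repeat the word at `index`, a common disfluency pattern."""
    return words[:index + 1] + [words[index]] + words[index + 1:]

def add_filler(words, index, rng):
    """Insert a filled pause before position `index`."""
    return words[:index] + [rng.choice(FILLERS)] + words[index:]

def perturb(text, seed=0):
    """Apply one repetition and one filler to a clean utterance."""
    rng = random.Random(seed)
    words = text.split()
    words = add_repetition(words, rng.randrange(len(words)))
    words = add_filler(words, rng.randrange(len(words)), rng)
    return " ".join(words)
```

Feeding such perturbed transcripts (or their spoken counterparts) alongside the clean originals is one simple way to measure how much a model's accuracy degrades under disfluency.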


Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language

C, Jesus Alvarez, Karajeanes, Daua D., Prado, Ashley Celeste, Ruttan, John, Yang, Ivory, O'Brien, Sean, Sharma, Vasu, Zhu, Kevin

arXiv.org Artificial Intelligence

The digital exclusion of endangered languages remains a critical challenge in NLP, limiting both linguistic research and revitalization efforts. This study introduces the first computational investigation of Comanche, an Uto-Aztecan language on the verge of extinction, demonstrating how minimal-cost, community-informed NLP interventions can support language preservation. We present a manually curated dataset of 412 phrases, a synthetic data generation pipeline, and an empirical evaluation of GPT-4o and GPT-4o-mini for language identification. Our experiments reveal that while LLMs struggle with Comanche in zero-shot settings, few-shot prompting significantly improves performance, achieving near-perfect accuracy with just five examples. Our findings highlight the potential of targeted NLP methodologies in low-resource contexts and emphasize that visibility is the first step toward inclusion. By establishing a foundation for Comanche in NLP, we advocate for computational approaches that prioritize accessibility, cultural sensitivity, and community engagement.
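The few-shot setup described above can be sketched as simple prompt assembly. The example phrases below are invented placeholders, not items from the actual 412-phrase Comanche dataset:

```python
# Sketch of few-shot prompting for language identification. The demo
# phrases and labels are hypothetical placeholders for illustration.

def build_fewshot_prompt(examples, query):
    """Assemble a few-shot language-ID prompt from (phrase, label) pairs."""
    lines = ["Identify the language of each phrase."]
    for phrase, label in examples:
        lines.append(f"Phrase: {phrase}\nLanguage: {label}")
    lines.append(f"Phrase: {query}\nLanguage:")
    return "\n\n".join(lines)

demo = [("<example phrase A>", "Comanche"), ("hello there", "English")]
prompt = build_fewshot_prompt(demo, "<unseen phrase>")
```

The resulting string would be sent to a model such as GPT-4o; per the abstract, around five such in-context examples were enough to reach near-perfect identification accuracy.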


EXCLAIM: An Explainable Cross-Modal Agentic System for Misinformation Detection with Hierarchical Retrieval

Wu, Yin, Zhang, Zhengxuan, Wang, Fuling, Luo, Yuyu, Xiong, Hui, Tang, Nan

arXiv.org Artificial Intelligence

Misinformation continues to pose a significant challenge in today's information ecosystem, profoundly shaping public perception and behavior. Among its various manifestations, Out-of-Context (OOC) misinformation is particularly deceptive, as it distorts meaning by pairing authentic images with misleading textual narratives. Existing methods for detecting OOC misinformation predominantly rely on coarse-grained similarity metrics between image-text pairs, which often fail to capture subtle inconsistencies or provide meaningful explainability. While multi-modal large language models (MLLMs) demonstrate remarkable capabilities in visual reasoning and explanation generation, they have not yet demonstrated the capacity to address the complex, fine-grained, cross-modal distinctions necessary for robust OOC detection. To overcome these limitations, we introduce EXCLAIM, a retrieval-based framework designed to leverage external knowledge through a multi-granularity index of multi-modal events and entities. Our approach integrates multi-granularity contextual analysis with a multi-agent reasoning architecture to systematically evaluate the consistency and integrity of multi-modal news content. Comprehensive experiments validate the effectiveness and resilience of EXCLAIM, demonstrating its ability to detect OOC misinformation with 4.3% higher accuracy than state-of-the-art approaches, while offering explainable and actionable insights.


Zero-Shot Image-Based Large Language Model Approach to Road Pavement Monitoring

Xu, Shuoshuo, Zhao, Kai, Loney, James, Li, Zili, Visentin, Andrea

arXiv.org Artificial Intelligence

Effective and rapid evaluation of pavement surface condition is critical for prioritizing maintenance, ensuring transportation safety, and minimizing vehicle wear and tear. While conventional manual inspections suffer from subjectivity, existing machine learning-based methods are constrained by their reliance on large, high-quality labeled datasets, which require significant resources and limit adaptability across varied road conditions. Recent advancements in Large Language Models (LLMs) present significant potential for overcoming these challenges. In this study, we propose an automated zero-shot learning approach that leverages the image recognition and natural language understanding capabilities of LLMs to assess road conditions effectively. Multiple LLM-based assessment models were developed, employing prompt engineering strategies aligned with the Pavement Surface Condition Index (PSCI) standards. The models' accuracy and reliability were evaluated against official PSCI results, and an optimized model was ultimately selected. Extensive tests benchmarked the optimized model against evaluations by experts at various levels using Google Street View road images. The results reveal that the LLM-based approach can effectively assess road conditions, with the optimized model, which employs comprehensive and structured prompt engineering strategies, outperforming simpler configurations in accuracy and consistency and even surpassing expert evaluations. Moreover, the successful application of the optimized model to Google Street View images demonstrates its potential for future city-scale deployments. These findings highlight the transformative potential of LLMs in automating road damage evaluations and underscore the pivotal role of detailed prompt engineering in achieving reliable assessments.
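A minimal sketch of what PSCI-aligned prompt engineering might look like is shown below: a structured instruction plus a parser for the model's reply. The rubric wording is an assumption for illustration, not the paper's actual prompt; PSCI itself is a 1-to-10 rating scale:

```python
import re

# Hypothetical structured prompt; the wording is our assumption, not the
# paper's. The parser extracts and validates the 1-10 PSCI score.

PROMPT = (
    "You are a pavement engineer. Examine the road image and rate its "
    "surface condition on the PSCI scale (1 = failed, 10 = no defects). "
    "Describe the visible distresses first, then answer with 'PSCI: <score>'."
)

def parse_psci(reply):
    """Extract the 1-10 score from a model reply, or None if absent/invalid."""
    match = re.search(r"PSCI:\s*(\d+)", reply)
    if not match:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 10 else None
```

Forcing the model to describe distresses before committing to a score, and constraining the answer format so it can be parsed reliably, are the kinds of structured-prompting choices the abstract credits for the optimized model's consistency.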


Compare without Despair: Reliable Preference Evaluation with Generation Separability

Ghosh, Sayan, Srinivasan, Tejas, Swayamdipta, Swabha

arXiv.org Artificial Intelligence

Human evaluation of generated language through pairwise preference judgments is pervasive. However, in common scenarios, such as when generations from a model pair are very similar or when stochastic decoding produces large variation across generations, these judgments become inconsistent. We address these challenges by introducing a meta-evaluation measure, separability, which estimates how suitable a test instance is for pairwise preference evaluation. For a candidate test instance, separability samples multiple generations from a pair of models and measures how distinguishable the two sets of generations are. Our experiments show that instances with high separability values yield more consistent preference ratings from both human and automatic raters. Further, the distribution of separability offers insight into which test benchmarks are more valuable for comparing models. Finally, we incorporate separability into ELO ratings, accounting for how suitable each test instance is for reliably ranking LLMs. Overall, separability has implications for consistent, efficient, and robust preference evaluation of LLMs with both human and automatic raters.
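The core idea, that two models' generation sets should be distinguishable for a preference judgment to be meaningful, can be sketched in a toy form. Using surface string similarity as the distinguishability proxy is our simplification; the paper's actual measure may differ:

```python
from difflib import SequenceMatcher
from itertools import combinations, product

# Toy separability: high when within-model generations resemble each other
# more than cross-model pairs do. String similarity stands in for whatever
# distance the real measure uses.

def sim(a, b):
    return SequenceMatcher(None, a, b).ratio()

def separability(gens_a, gens_b):
    """Mean within-model similarity minus mean cross-model similarity."""
    cross = [sim(x, y) for x, y in product(gens_a, gens_b)]
    within = [sim(x, y) for x, y in combinations(gens_a, 2)]
    within += [sim(x, y) for x, y in combinations(gens_b, 2)]
    return sum(within) / len(within) - sum(cross) / len(cross)
```

An instance where the two models produce near-identical outputs scores near zero (a poor candidate for pairwise judging), while clearly distinguishable output sets score high.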


SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

Qi, Peng, Yan, Zehong, Hsu, Wynne, Lee, Mong Li

arXiv.org Artificial Intelligence

Misinformation is a prevalent societal issue due to its potential high risks. Out-of-context (OOC) misinformation, where authentic images are repurposed with false text, is one of the easiest and most effective ways to mislead audiences. Current methods focus on assessing image-text consistency but lack convincing explanations for their judgments, which is essential for debunking misinformation. While Multimodal Large Language Models (MLLMs) have rich knowledge and innate capability for visual reasoning and explanation generation, they still lack sophistication in understanding and discovering the subtle cross-modal differences. In this paper, we introduce SNIFFER, a novel multimodal large language model specifically engineered for OOC misinformation detection and explanation. SNIFFER employs two-stage instruction tuning on InstructBLIP. The first stage refines the model's concept alignment of generic objects with news-domain entities and the second stage leverages language-only GPT-4 generated OOC-specific instruction data to fine-tune the model's discriminatory powers. Enhanced by external tools and retrieval, SNIFFER not only detects inconsistencies between text and image but also utilizes external knowledge for contextual verification. Our experiments show that SNIFFER surpasses the original MLLM by over 40% and outperforms state-of-the-art methods in detection accuracy. SNIFFER also provides accurate and persuasive explanations as validated by quantitative and human evaluations.


EXTRACTER: Efficient Texture Matching with Attention and Gradient Enhancing for Large Scale Image Super Resolution

Reyes-Saldana, Esteban, Rivera, Mariano

arXiv.org Artificial Intelligence

Recent reference-based image super-resolution (RefSR) methods have improved on state-of-the-art deep approaches by introducing attention mechanisms that enhance low-resolution images with high-resolution textures transferred from a reference image. The main idea is to search for matches between patches of the low-resolution (LR) and reference images in a feature space and merge them using deep architectures. However, existing methods lack an accurate texture search: they divide images into as many patches as possible, resulting in inefficient memory usage, and cannot manage large images. Herein, we propose a deep search with more efficient memory usage that significantly reduces the number of image patches and finds the $k$ most relevant texture matches for each low-resolution patch over the high-resolution reference patches, resulting in an accurate texture match. We enhance the super-resolution result by adding gradient density information using a simple residual architecture, showing competitive results on the PSNR and SSIM metrics.
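The top-k texture search described above can be sketched in a toy form. Features here are plain lists and similarity is cosine; a real system would operate on deep feature maps with a batched implementation:

```python
import math

# Toy top-k patch matching: for each LR patch feature, rank reference patch
# features by cosine similarity and keep the k best. Assumes nonzero vectors.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def topk_matches(lr_feats, ref_feats, k=2):
    """Return, per LR patch, indices of the k best-matching reference patches."""
    result = []
    for q in lr_feats:
        ranked = sorted(range(len(ref_feats)),
                        key=lambda i: cosine(q, ref_feats[i]),
                        reverse=True)
        result.append(ranked[:k])
    return result
```

Keeping only the k most relevant reference patches per LR patch, rather than attending over every patch pair, is what bounds the memory cost on large images.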


Dialogs Re-enacted Across Languages

Ward, Nigel G., Avila, Jonathan E., Rivas, Emilia, Marco, Divette

arXiv.org Artificial Intelligence

For example, you might say: The purpose of this data collection is to further speech-to-speech translation research by creating an open collection of translated conversations, something that has not been done before. Today you will have a conversation with your partner in one language, then re-enact parts of it in another language. I will select some snippets of the audio and replay them for you to translate and re-record in the other language. It is important that you try your best to make it sound natural while also keeping the same feeling as in the original. Try to recreate pauses, laughs, long breaths, or anything of that sort during the second recording if possible. I can replay the audio as many times as you need, and give you as much time as you need to translate. If either of us feels that you could translate the words better, or that the prosody was not as faithful in feeling as it could be, then we can redo it until we are satisfied. Please be vocal about any opinions you have on the process and ask any questions that may arise.